Uber API Design Evaluation and Latency Budget

Learn how we meet the non-functional requirements and estimate the response time of the Uber APIs.

Introduction#

We've discussed the design considerations and API model for the functional requirements of our Uber service. In this lesson, we'll cover different approaches to achieve our non-functional requirements and estimate the response time of our Uber services.

Non-functional requirements#

The non-functional requirements for our APIs are availability, low latency, scalability, and security. Let's understand how we can achieve our requirements.

Availability#

We ensure the availability of the services by decoupling them. For example, if the rider service goes down temporarily, the driver service keeps running and continues to maintain up-to-date driver locations. The availability of our services also depends on supporting services. For example, if Google Maps is unavailable, we can fall back to alternate map services (such as MapQuest and Waze), although these may not support every feature that Google Maps offers. Similarly, our service supports multiple payment methods: if one payment service is down, riders can pay through other methods, such as Uber balance or simply cash. Moreover, we prioritize critical requests, such as ongoing trips and payments, and apply rate limiting at the API gateway to other requests to prevent the services from being overloaded. If a service instance is down or overloaded, multiple replicas of the service ensure availability.

Note: Uber may also facilitate integration with local payment gateways depending on the region of the service. Therefore, the Uber service will not go down because a payment gateway is unavailable.
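The rate limiting mentioned above can be sketched with a token bucket, one common gateway-side technique. This is illustrative only, not Uber's actual implementation; the class and parameters below are assumptions.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter sketch. An API gateway could keep one
    bucket per client and apply it to low-priority requests.
    (Illustrative only; not Uber's actual implementation.)"""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec       # tokens refilled per second
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)  # the bucket starts full
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if rate limited."""
        now = time.monotonic()
        # Refill tokens accumulated since the last check, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=5, capacity=10)
print(bucket.allow())  # True: the first request passes because the bucket starts full
```

Requests beyond the burst capacity are rejected (or queued) until tokens refill, which keeps a single chatty client from overloading the services.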

Scalability#

Since most of the communication in our design happens through the pub-sub service, we create multiple replicas of it to avoid a single point of failure (SPOF). We can use different instances of the pub-sub service for unrelated combinations of riders and drivers to distribute the load. This also allows us to decouple services and enhance the scalability of the API. Because requests to our services are stateless, we can replicate the services and forward any request to any available server. However, the scalability of our services also depends on the scalability of the supporting services. For example, if any supporting service has scaling issues, it could become a bottleneck for our service. Services like Google Maps are highly scalable and use CDNs to reduce the risk of disruption by serving static data specific to a customer's region, so there's little chance that such supporting services will affect our service.
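The assignment of unrelated rider/driver combinations to pub-sub instances can be sketched with deterministic hash-based routing. The instance names and trip key below are hypothetical, and real deployments would typically use consistent hashing to tolerate instance changes.

```python
import hashlib

def pubsub_instance(trip_key: str, instances: list[str]) -> str:
    """Deterministically map a rider/driver pair (identified here by a
    hypothetical trip key) to one pub-sub instance, so that unrelated
    trips spread across instances while messages for the same trip
    always reach the same instance."""
    digest = hashlib.sha256(trip_key.encode()).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]

instances = ["pubsub-1", "pubsub-2", "pubsub-3"]  # illustrative replica names
# Messages for the same trip always go to the same instance:
assert pubsub_instance("trip-1001", instances) == pubsub_instance("trip-1001", instances)
```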

Point to Ponder

Question

Is it possible to load balance client requests while maintaining client sessions?


Yes, it is possible to load balance incoming client requests while maintaining the client's session on the server side. The following common techniques achieve this purpose:

  • Sticky sessions: In sticky sessions, the load balancer (API gateway) uses the URL, session ID, and other information to identify the original server and redirect the request to the same server.

  • Shared sessions: In the shared-session approach, session state is stored in common storage (an in-memory cache, a database, etc.) so that any server can handle a request from any client by looking up its session.

Note: While a session is being maintained, the response time of a request will most likely increase compared to a normal request (a request without a session).
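The shared-session approach can be sketched as follows, with a plain dictionary standing in for external shared storage such as a distributed cache (an assumption; server and session names are illustrative).

```python
# Shared-session sketch: all servers read and write the same session store,
# so the load balancer may route any request to any server.
# A dict stands in for an external store such as a distributed cache.
SESSION_STORE: dict[str, dict] = {}

def handle_request(server_name: str, session_id: str) -> dict:
    """Any server can serve any session by consulting shared storage."""
    session = SESSION_STORE.setdefault(session_id, {"requests": 0})
    session["requests"] += 1
    session["last_server"] = server_name  # recorded only for illustration
    return session

handle_request("app-1", "sess-42")
handle_request("app-2", "sess-42")  # a different server sees the same session state
```

The extra round trip to the shared store is one reason a session-bearing request tends to be slower than a stateless one, as the note above points out.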

Security#

Our API allows only authenticated users to access its resources. Users log in to Uber with a valid email address or phone number and confirm the entered credentials with a verification code. Moreover, while sharing trip details with others, we use encrypted tokens (issued only if the recipient is in the rider's trusted contact list) to ensure the details reach the intended person. We don't allow direct interaction between supporting services and client devices because clients could manipulate trip details (inaccurate distances, wait times, and fare estimates) by calling the supporting services directly.

We allow third-party login only through OAuth 2.0 and OIDC, using the authorization code flow with Proof Key for Code Exchange (PKCE) to obtain a third-party access token. Access tokens mitigate the risk of data leakage when logging in with third-party applications. Additionally, we transport all data in encrypted form using a secure transport layer protocol such as TLS 1.3 so that attackers can't access the user's personal information.
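The PKCE portion of that flow can be illustrated as follows: the client generates a random code verifier and derives an S256 code challenge from it, per RFC 7636. The challenge accompanies the authorization request; the verifier is revealed only when exchanging the authorization code for a token. This is a generic sketch, not Uber's client code.

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge
    (RFC 7636). The challenge is sent with the authorization request;
    the verifier is sent later in the token exchange, proving both
    requests came from the same client."""
    # 32 random bytes -> 43-character base64url string, no padding.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    challenge = base64.urlsafe_b64encode(
        hashlib.sha256(verifier.encode()).digest()
    ).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
```

Because an intercepted authorization code is useless without the matching verifier, PKCE protects the code exchange even on clients that cannot keep a secret.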

Low latency#

The driver service frequently receives updates about each driver's location and stores this information in memory to instantly find nearby drivers. We also pre-estimate fares for the regularly visited places of clients who use Uber daily and cache them to respond quickly to ride requests. The persistent connection between the driver's device and the driver service helps us reduce network latency. Furthermore, we paginate the trip history so clients receive it in small, quick responses.

Note: The pre-estimated fare is only an estimate that appears when choosing a different vehicle type, while the actual fare is calculated when the ride request is sent. If there is a significant difference, Uber will prompt the rider to accept the fare before notifying the drivers.
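Caching the pre-estimated fares could be sketched as a TTL cache keyed by route and vehicle type, so stale estimates expire and get recomputed. The class and field names below are illustrative assumptions, not Uber's data model.

```python
import time

class FareCache:
    """TTL cache of pre-estimated fares for frequently requested routes.
    (Illustrative sketch; keys and TTL are assumptions.)"""

    def __init__(self, ttl_seconds: float = 300):
        self.ttl = ttl_seconds
        # (source, destination, vehicle type) -> (fare, insertion time)
        self.store: dict[tuple[str, str, str], tuple[float, float]] = {}

    def put(self, source: str, dest: str, vehicle: str, fare: float) -> None:
        self.store[(source, dest, vehicle)] = (fare, time.monotonic())

    def get(self, source: str, dest: str, vehicle: str):
        entry = self.store.get((source, dest, vehicle))
        if entry is None:
            return None
        fare, stamp = entry
        if time.monotonic() - stamp > self.ttl:
            del self.store[(source, dest, vehicle)]
            return None  # expired: the estimate must be recomputed
        return fare

cache = FareCache(ttl_seconds=300)
cache.put("home", "office", "uberx", 12.5)
print(cache.get("home", "office", "uberx"))  # 12.5
```

A hit avoids the round trips to mapping and pricing services; a miss falls through to the normal estimation path, consistent with the note above that the cached value is only an estimate.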

Achieving Non-Functional Requirements

Availability

  • Couples services loosely
  • Uses alternate services when a supporting service fails
  • Uses rate limiting to keep the services from being overloaded
  • Uses replicas if any origin service is down

Scalability

  • Replicates services to avoid SPOF and improve scalability
  • Load balances incoming requests due to the stateless nature of HTTP
  • Uses CDNs to serve static data (map tiles, etc.) specific to the client's region

Security

  • Uses access tokens to share trip details
  • Authenticates and authorizes users through a phone number or email address
  • Funnels API calls to supporting services through back-end servers
  • Uses OAuth/OIDC tokens with the PKCE mechanism for third-party access
  • Uses TLS 1.3 to encrypt data in transit

Low latency

  • Caches frequently requested ride estimates
  • Uses pagination to fetch trip history quickly
  • Uses a persistent connection between the driver's device and the driver service

Latency budget#

In this section, we'll estimate the response time of the Uber APIs. Various APIs coordinate under the hood of the Uber system. Let's divide this section by the type of request sent (GET or POST) and estimate the response time for each. We'll start by estimating request and response sizes, and calculate response times at the end of the lesson.

As discussed in the back-of-the-envelope latency calculations, the latencies of GET and POST requests are affected by different parameters. For GET, the average RTT remains the same regardless of the data size (due to the small request size), while the time to download the response grows by 0.4 ms per KB. For POST, the RTT grows with the request size by 1.15 ms per KB on top of the base RTT of 260 ms.
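These two latency models can be written as small helper functions. This is a sketch using the constants stated in this lesson: a 260 ms base POST RTT, a 70 ms GET RTT, download/RTT growth of 0.4 and 1.15 ms per KB, and minimum/maximum base times of 120.5 and 201.5 ms.

```python
def post_latency(request_kb: float, base_ms: float,
                 base_rtt_ms: float = 260.0, per_kb_ms: float = 1.15,
                 download_ms: float = 0.4) -> float:
    """POST latency: base time + size-dependent RTT + download time
    for the small (~1 KB) response."""
    return base_ms + (base_rtt_ms + per_kb_ms * request_kb) + download_ms

def get_latency(response_kb: float, base_ms: float,
                rtt_ms: float = 70.0, per_kb_ms: float = 0.4) -> float:
    """GET latency: base time + fixed RTT + download time proportional
    to the response size."""
    return base_ms + rtt_ms + per_kb_ms * response_kb

# Plugging in the minimum base time (120.5 ms) reproduces the values
# derived later in this lesson:
print(post_latency(1.5, base_ms=120.5))  # ≈ 382.625 ms (minimum POST latency)
print(get_latency(13, base_ms=120.5))    # ≈ 195.7 ms (minimum GET latency)
```

Adding the 4 ms processing time to either result yields the corresponding response time.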

Request and response size#

We'll estimate the response time for the POST method to book a ride and the GET method to retrieve ride history. The size of each request is estimated below:

Booking a ride#

  • Request size: Assume the request size is about 800 bytes, which includes the vehicle type, source and destination addresses, etc. The total request size, including headers, is approximately 1.5 KB.

  • Response size: The response to a POST request for booking a ride is approximately 1 KB.

We'll only consider the request size as per our convention because the response is a standard 1 KB, and only the request size affects the response time in the case of POST. The overall response time can be calculated as follows:

Response Time Calculator for Booking a Ride (request size: 1.5 KB)

  • Minimum latency: 382.625 ms
  • Maximum latency: 463.625 ms
  • Minimum response time: 386.625 ms
  • Maximum response time: 467.625 ms

Assuming the request size is 1.5 KB:

Time_{latency} = Time_{base} + RTT_{post} + Download

RTT_{post} = RTT_{base} + 1.15 \times Size = 260\ ms + 1.15\ ms \times 1.5\ KB

Time_{latency\_min} = Time_{base\_min} + (RTT_{base} + 1.15 \times size\ of\ request\ (KB)) + 0.4

= 120.5 + (260 + 1.15 \times 1.5) + 0.4 = 382.625\ ms

Time_{latency\_max} = Time_{base\_max} + (RTT_{base} + 1.15 \times size\ of\ request\ (KB)) + 0.4

= 201.5 + (260 + 1.15 \times 1.5) + 0.4 = 463.625\ ms

Similarly, the response time is calculated as:

Time_{Response\_min} = Time_{latency\_min} + Time_{processing\_min} = 382.625\ ms + 4\ ms = 386.625\ ms

Time_{Response\_max} = Time_{latency\_max} + Time_{processing\_max} = 463.625\ ms + 4\ ms = 467.625\ ms

Get rides history#

  • Request size: The request size is approximately 1 KB for a GET method because the request body is empty.

  • Response size: Let's assume each ride record is approximately 1.5 KB. If each request returns eight records, then the response size of this GET request is approximately 13 KB (including 1 KB for headers).

Since this is a standard GET request, we’ll only consider response size, which affects response time. Here’s the response time calculation:

Response Time Calculator for Get Rides History (response size: 13 KB)

  • Minimum latency: 195.7 ms
  • Maximum latency: 276.7 ms
  • Minimum response time: 199.7 ms
  • Maximum response time: 280.7 ms

Assuming the response size is 13 KB, the latency is calculated as:

Time_{latency\_min} = Time_{base\_min} + RTT_{get} + 0.4 \times size\ of\ response\ (KB) = 120.5 + 70 + 0.4 \times 13 = 195.7\ ms

Time_{latency\_max} = Time_{base\_max} + RTT_{get} + 0.4 \times size\ of\ response\ (KB) = 201.5 + 70 + 0.4 \times 13 = 276.7\ ms

Similarly, the response time is calculated using the following equation:

Time_{Response} = Time_{latency} + Time_{processing}

Now, for minimum response time, we use minimum values of base time and processing time:

Time_{Response\_min} = Time_{latency\_min} + Time_{processing\_min} = 195.7\ ms + 4\ ms = 199.7\ ms

Now, for maximum response time, we use maximum values of base time and processing time:

Time_{Response\_max} = Time_{latency\_max} + Time_{processing\_max} = 276.7\ ms + 4\ ms = 280.7\ ms

A summary of the overall latency budget for GET and POST requests of Uber is shown in the illustration below:

The Uber API processing time in terms of GET and POST requests

While the response time of a POST request may seem high at first glance, services like Uber can tolerate up to one second of delay. Therefore, we can consider the estimates above acceptable.

Note: For simplicity, we have not included the time spent interacting with supporting services in the response times above. Although the time spent on service-to-service interaction is much less than the time spent communicating with end users, it still adds delay and increases response times.

Summary#

In this chapter, we learned about designing an efficient API for a transport service like Uber. We discussed the key design factors and the decisions we made. We further provided the request and response formats of the messages exchanged with endpoints that play an important role in the overall flow of the Uber service. Finally, we estimated response times to achieve near-real-time communication for the Uber service.
